
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables



Abstract

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering analysis all assume a common diagonal covariance matrix across clusters, which however may not hold in practice. To analyze high-dimensional data, particularly those with relatively low sample sizes, this article introduces a novel approach that shrinks the variances together with means, in a more general situation with cluster-specific (diagonal) covariance matrices. Furthermore, selection of grouped variables via inclusion or exclusion of a group of variables altogether is permitted by a specific form of penalty, which facilitates incorporating subject-matter knowledge, such as gene functions in clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method.
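To make the phrase "shrinkage and thresholding" concrete, the following is a minimal sketch of how an L1 penalty on a cluster mean turns the usual weighted-average M-step into a soft-thresholded one. It is an illustration under stated assumptions, not the article's actual update: the penalty form (lambda times the absolute value of each mean), the standardization of the data so that zero is the uninformative mean, and the names soft_threshold, penalized_mean_update, tau_k, sigma2_kj and lam are all hypothetical. The variance shrinkage and the grouped penalty described in the abstract are not shown.

```python
from typing import Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0.0 when |value| <= threshold."""
    magnitude = abs(value) - threshold
    if magnitude <= 0.0:
        return 0.0
    return magnitude if value > 0 else -magnitude


def penalized_mean_update(
    x: Sequence[float],      # observations of one variable j (assumed standardized)
    tau_k: Sequence[float],  # E-step responsibilities of cluster k for each observation
    sigma2_kj: float,        # current cluster-specific variance of variable j in cluster k
    lam: float,              # L1 penalty weight (hypothetical tuning parameter)
) -> float:
    """Assumed M-step for one mean under an L1 penalty lam * |mu_kj|.

    Setting the derivative of the penalized expected log-likelihood to zero
    gives the weighted mean, soft-thresholded at lam * sigma2_kj / sum(tau_k).
    """
    n_k = sum(tau_k)  # effective number of observations in cluster k
    weighted_mean = sum(t * v for t, v in zip(tau_k, x)) / n_k
    return soft_threshold(weighted_mean, lam * sigma2_kj / n_k)
```

In this sketch, a mean that is estimated as exactly zero in every cluster marks a variable that does not help separate the clusters, which is how thresholding can act as variable selection. Note that the threshold grows with the cluster-specific variance and shrinks with the effective cluster size.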
